Mismatch string kernels for discriminative protein classification
Similar resources
Mismatch string kernels for discriminative protein classification
MOTIVATION Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. RESULTS We introduce a class of string kernels, calle...
Mismatch String Kernels for SVM Protein Classification
We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the protein classification problem. These kernels measure sequence similarity based on shared occurrences of k-length subsequences, counted with up to m mismatches, and do not rely on any generative model for the positive training sequences. We compute the ke...
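The idea of counting shared k-length subsequences with up to m mismatches can be illustrated with a small sketch. This is a naive, brute-force version written for readability, not the efficient computation the abstract alludes to; the function names (`mismatch_neighborhood`, `mismatch_features`, `mismatch_kernel`) and the toy alphabet are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations, product


def mismatch_neighborhood(kmer, m, alphabet):
    """Return all strings of len(kmer) within Hamming distance m of kmer."""
    neighbors = set()
    k = len(kmer)
    for d in range(m + 1):
        # Choose d positions and try every letter combination at them.
        for positions in combinations(range(k), d):
            for letters in product(alphabet, repeat=d):
                candidate = list(kmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                neighbors.add("".join(candidate))
    return neighbors


def mismatch_features(seq, k, m, alphabet):
    """Map a sequence to counts over k-mers, crediting every k-mer
    within m mismatches of each k-length window of the sequence."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        for neighbor in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            features[neighbor] += 1
    return features


def mismatch_kernel(x, y, k, m, alphabet):
    """Inner product of the two mismatch feature vectors."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(count * fy[kmer] for kmer, count in fx.items())


# Toy two-letter alphabet (hypothetical; proteins would use 20 amino acids).
print(mismatch_kernel("ABA", "BAB", k=2, m=0, alphabet="AB"))
print(mismatch_kernel("ABA", "BAB", k=2, m=1, alphabet="AB"))
```

With m = 0 this reduces to counting exactly shared k-mers; raising m lets near-matching k-mers contribute as well. The brute-force neighborhood grows quickly with k, m, and alphabet size, so this form is only practical for small examples.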
String Kernels with Feature Selection for SVM Protein Classification
We introduce a general framework for string kernels. This framework can produce various types of kernels, including a number of existing kernels, to be used with support vector machines (SVMs). In this framework, we can select the informative subsequences to reduce the dimensionality of the feature space. We can model the mutations in biological sequences. Finally, we combine contributions of s...
Accuracy of String Kernels for Protein Sequence Classification
Determining protein sequence similarity is an important task for protein classification and homology detection. Typically this may be done using sequence alignment algorithms, yet fast and accurate alignment-free kernel-based classifiers exist. Viewing sequences as a “bag of words”, we test a simple weighted string kernel, investigating the effects of k-mer length, sequence length and choice of...
Text Classification using String Kernels
We propose a novel approach for categorizing text documents based on the use of a special kernel. The kernel is an inner product in the feature space generated by all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in t...
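A minimal sketch of the kernel described above, assuming each occurrence of a length-k subsequence is weighted by a decay factor raised to the span it covers in the text (last index minus first index, plus one). It enumerates index sets directly, so it is only suitable for very short strings; the names `subsequence_features` and `subsequence_kernel` and the parameter `decay` are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(text, k, decay):
    """Map text to weights over all (possibly non-contiguous)
    subsequences of length k, each occurrence weighted by decay**span."""
    features = defaultdict(float)
    for idx in combinations(range(len(text)), k):
        span = idx[-1] - idx[0] + 1  # full length covered in the text
        features["".join(text[i] for i in idx)] += decay ** span
    return features


def subsequence_kernel(s, t, k, decay):
    """Inner product of the two weighted subsequence feature vectors."""
    fs = subsequence_features(s, k, decay)
    ft = subsequence_features(t, k, decay)
    return sum(w * ft[u] for u, w in fs.items() if u in ft)


# "cat" and "car" share only the subsequence "ca", which spans 2
# characters in each word, so the kernel value is decay**4.
print(subsequence_kernel("cat", "car", k=2, decay=0.5))
```

Because gappy occurrences are down-weighted rather than discarded, two texts can still score as similar when their shared character patterns are interrupted by other characters.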
Journal
Journal title: Bioinformatics
Year: 2004
ISSN: 1367-4803,1460-2059
DOI: 10.1093/bioinformatics/btg431